<html>
<head>
<link rel="shortcut icon" href="./favicon.ico">
<link rel="stylesheet" type="text/css" href="./style.css">
<link rel="canonical" href="./handshake.html">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="A set of rules for consistent and high-performance designs with ready/valid handshakes, and a discussion of the underlying synchronization mechanism.">
<title>Rules for Ready/Valid Handshakes</title>
</head>
<body>

<h1>Rules for Ready/Valid Handshakes and Synchronization</h1>

<p>from <a href="./index.html">FPGA Design Elements</a> by <a href="https://fpgacpu.ca/">Charles Eric LaForest, PhD.</a>

<p>Ready/valid handshakes are a flexible and lightweight way to connect and
control modules in composable ways, but as I designed more complex modules, I
found some corner cases I couldn't quite fit into the ready/valid handshake
model, and some designs started not composing well because I was implementing
the handshakes inconsistently. So I worked out a set of rules. I also found new
ways to distribute control logic, and I outline the synchronization process that
underlies ready/valid handshakes.

<h2>AXI is not Fundamental</h2>

<p>I originally based these rules on those written in the AMBA AXI4
Specification, Chapter A3, "<em>Single Interface Requirements</em>", and
expanded on their meaning and consequences. I have replaced the inappropriate
and inaccurate "master/slave" terminology with "source/destination",
correspondingly, when refering to handshake interfaces.

<p>Ready/valid handshakes are the underlying mechanism for AXI interfaces. Each
channel of AXI4 is a ready/valid interface, and the TLAST of AXI4-Stream is
merely a bit of metadata controlled by the ready/valid handshake. AXI4 and
AXI4-Stream interfaces bring in too many assumptions and complications for most
designs. It is simpler, smaller, and more flexible to compose your own
handshaking interfaces up from ready/valid primitives such as <a
href="./Pipeline_Skid_Buffer.html">Skid Buffers</a> and <a
href="./Pipeline_Merge_Round_Robin.html">Arbiters</a> and <a
href="./Pipeline_Fork_Blocking.html">Forks</a> and <a
href="./Pipeline_Join.html">Joins</a>, to only name a few.

<h2>Interfaces</h2>

<p>There are two complementary interface types: <em>source</em> and
<em>destination</em>. Both interfaces have three signals: valid, ready, and
data. Ready and valid are single-bit signals, and the data signal is of
arbitrary width. <b>All signals are synchronous to the rising edge of the
clock</b>.

<ul>
<li>The <em>source</em> interface outputs valid and data, and takes ready as input.
<li>The <em>destination</em> interface outputs ready, and receives valid and data.
</ul>

<p>A connection always goes from the source interface to the destination
interface.  Other pairings cannot work.

<h3>Loops</h3>

<p>There must be no combinational paths from input to output signals in the
source interface (ready to valid), nor in the destination interface (valid to
ready). Otherwise combinational loops will form when connecting interfaces.

<p>Even if only one type of interface has a loop, thus avoiding combinational
loops, the remaining combinational path will go from one interface to the other
and back to the first, which will cause a longer delay. This delay may both
limit your design clock frequency and make placement and routing of the modules
connected by these interfaces more difficult. <em>It is not worth saving a
cycle of latency here at the expense of the performance of the rest of the
design.</em> Proper handshake interface design and pipelining will avoid this
extra cycle of latency, while <em>improving</em> clock frequency and P&amp;R.

<p><b>ADDENDUM:</b> Although the above points stand in theory, in practice
allowing a combinational path between valid and ready within an interface can
be a good tradeoff.  Buffering every single source/destination connection can
bloat a design and will not improve performance except at the highest speeds.
Some pipeline control is most easily implemented combinationally between
interfaces, without buffering.  However, you must then be careful of
combinational loops and of the requirement to avoid livelocks and deadlocks
(see below).

<h2>Handshake Procedure</h2> 

<p>At any time, the source interface raises <em>and holds steady</em> valid
when data is available, and the destination interface raises ready only when it
can accept more data. When both valid and ready are high, the handshake is
complete, the destination interface accepts the data in the same cycle, and the
ready and valid outputs change state if necessary. 

<p>The destination interface can freely assert and deassert ready at any time.
However, it is beneficial to have the destination interface assert ready as
soon as it can accept data, before the source interface asserts valid, to
shorten handshakes to a single cycle. For the same reason, the source interface
should assert and hold steady valid as soon as it has data to send.

<p>After a handshake completes, the source interface drops valid if it has no
more data, otherwise it keeps valid high for the next handshake, which may
complete in the same clock cycle.  Similarly, the destination interface drops
ready if it cannot accept more data, otherwise it keeps ready high and can
complete the next handshake in the same clock cycle if valid is also high.

<h3>Delayed Handshakes</h3>

<p>It is possible to delay a handshake and have the destination interface
accept data at the moment the source interface asserts valid, then signal ready
to complete the handshake, after the accepted data is processed, to move the
source interface to the next data item. 

<p>Although this delayed handshake is legal and will work, it interferes with
conventional dataflow pipelining since the source interface cannot begin
processing and presenting the next data item concurrently with the destination
interface processing the current data item. If the source and destination
processing times are equal, which is the optimum when pipelining, this
incorrect handshake will halve the throughput instead of doubling it! 

<p>However, when full throughput pipelining is not needed or possible (e.g.: an
iterative calculation), a delayed handshake can implement a very useful control
mechanism by signalling to the source interface when the next item can be
processed. A <a href="./Pipeline_Half_Buffer.html">Half Buffer</a> implements
this control mechanism by accepting an item from the source interface, and
internally presenting that item and a valid signal at its own source interface.
Detecting the rise of valid with a <a href="./Pulse_Generator.html">Pulse
Generator</a> starts the internal logic, whose control logic now only need to
pulse ready when the calculation is done to complete the handshake. This
simplifies control and maintains concurrency.

<h3>Sampling Changing Data</h3>

<p>Typically, the source interface holds data steady alongside valid, changing
data only after the handshake completes. Any data value outside of the clock
cycle when the handshake completes is lost. It is allowable to have data be a
continuously changing value, synchronously to the clock, which will get sampled
whenever a handshake completes. 

<p>Sampling changing data is best done by having the source interface hold
valid high, while the destination interface asserts ready at an interval.  The
opposite method of sampling where the source interface periodically asserts
valid does not guarantee a predictable sampling interval, or cause a livelock,
since it depends on the destination interface having already asserted ready. It
may also cause a deadlock if the destination interface waits for valid before
raising ready (see below).

<h3>Avoiding Deadlocks and Livelocks</h3>

<p>We must constrain the behaviour of the valid and ready signals to prevent
deadlocks, where the source and destination interfaces wait forever for
eachother to respond, and to prevent livelocks, where both interfaces are
responding but the handshake never completes.

<p>To prevent deadlocks, the source interface must not wait until the
destination interface asserts ready before asserting valid, while the
destination interface can wait for the source interface to assert valid before
asserting ready. This waiting will extend the handshake to two cycles,
completing in the second cycle, but has applications where we want to
selectively complete handshakes from multiple source interfaces, as in an <a
href="./Pipeline_Merge_Priority.html">Arbiter</a>.

<p>To prevent livelocks, when the source interface asserts valid <em>it must
remain asserted until the handshake completes</em>, else we could end up in a
situation where the source interface temporarily asserts valid and the
destination interface temporarily asserts ready, but they never coincide to
complete the handshake.

<h2>Reset</h2>

<p>The reset signal to modules with source and/or destination interfaces can be
active high or low, may assert asynchronously, but must deassert synchronously.
However, using an active high reset limits the amount of inverted logic to
read, which keeps the logic clearer. <em>Also, I strongly recommend using a
fully synchronous reset since an asynchronous reset will inhibit register
retiming.</em> Any latch holding state inside the source or destination
interface must be reset, else the interface may remain in the wrong state after
reset (e.g. the source interface signalling that data is valid when there isn't
any).

<h2>Internal State</h2>

<p>Any internal state of a module which affects a source or destination
interface must only ever change in the same cycle as a <em>completed</em>
handshake, when both ready and valid are raised together. Otherwise, data may
be lost.

<p>For example, if you are counting down words in a packet passing through a
source interface, and you change state as soon as the counter reaches zero, you
will lose the last packet word if, by coincidence, the ready signal of the
destination interface goes low at the same time as the counter reaches zero. By
the time the destination interface raises ready, the source interface has
already moved on to the next packet and the previous data word has been lost,
corrupting the previous packet and all following packets.

<p>Thus, as a rule, define a line of code like this one:

<pre>
    always @(*) begin
        handshake_complete = (ready == 1'b1) && (valid == 1'b1);
    end
</pre>

and use it to gate any state transitions that affect a source or destination
interface.

<h2>Underlying Synchronization</h2>

<p>Since we very usefully partition the design of computating machinery as data
and control paths, it's natural to think of one module as transferring data to
another, and of one module as controlling the operation of another. The
ready/valid handshake is a great fit to both data transfers and module control.

<p>However, ready/valid handshakes are only a particular implementation of a
more fundamental mechanism of <em>synchronization</em> (a.k.a.:
<em>rendez-vous</em>), upon which we then build up any combination of data
transfer and control we need. This is another reason "master/slave" terminology
(and its analogs) should be discarded: it assumes constraints to the design
process which are not fundamental or always necessary. These constraints may be
useful in that they help us manage the complexity of a design, but at the cost
of an increased mismatch between the desired computation and its
implementation, which increases complexity and testing difficulty relative to a
more closely matched design.

<p>To illustrate this underlying synchronization mechanism, imagine two
modules which operate concurrently, but must sometimes synchronize with
eachother, for any number of possible reasons (e.g.: minimizing buffering,
preventing pipeline stalls, keeping audio and video streams synchronized).
Each module has an input named "OK_IN", and an output named "OK_OUT", and we
simply connect these outputs to those inputs as expected.

<p>When one module needs to synchronize, it raises and holds high "OK_OUT" and
waits. Eventually, the other module will do the same and both modules will
simultaneously see <code>(OK_IN == 1'b1) && (OK_OUT == 1'b1) == 1'b1</code>
during the same clock cycle, and their control logics can change state also
simultaneously, if not identically. The modules have synchronized, no data was
transferred, and neither one is in control of the other.

<p>(<em>This is similar, but not identical to passing both module outputs through a
<a href="./Pipeline_Synchronizer_Lazy.html">Pipeline Synchronizer</a>, as that
includes the downstream module(s), implementing a 3-way synchronization and
data transfer.</em>)

<p>If we then use this synchronization mechanism to transfer data in one
direction, we have re-invented ready/valid interfaces. We could also implement
a bidirectional simultaneous data transfer the same way, with the understanding
that both "OK" signals must follow the same deadlock and livelock avoidance
rules mentioned before.

<p>Finally, some readers will notice the similarity between this
synchronization mechanism and the 2-phase and 4-phase handshakes used in
asynchronous logic, only simplified since we have a clock as an absolute time
reference.

<hr><a href="./index.html">Back to FPGA Design Elements</a>
</body>
</html>

